Top-k Pattern Matching Using an Information-Theoretic Criterion over Probabilistic Data Streams

Authors

  • Kento Sugiura
  • Yoshiharu Ishikawa
Abstract

With the development of data mining technologies for sensor data streams, more sophisticated methods for complex event processing are in demand. Because event recognition results may contain errors, we need to deal with the uncertainty of events. We therefore consider probabilistic event data streams, in which events carry occurrence probabilities, and develop a pattern matching method based on regular expressions. In this paper, we first analyze the semantics of pattern matching over non-probabilistic data streams, and then formulate the problem of top-k pattern matching over probabilistic data streams. We introduce an information-theoretic criterion to select appropriate matches as the result of pattern matching. We then present an efficient algorithm to detect top-k matches, and evaluate the effectiveness of our approach using real and synthetic datasets.

Similar resources

Matching Top-k Answers of Twig Patterns in Probabilistic XML

The flexibility of the XML data model allows a more natural representation of uncertain data compared with the relational model. The top-k matching of a twig pattern against probabilistic XML data is essential. Some classical twig pattern algorithms can be adjusted to process probabilistic XML. However, as far as finding answers of the top-k probabilities is concerned, the existing algorithms s...

CPR: Complex Pattern Ranking for Evaluating Top-k Pattern Queries over Event Streams

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, however, such as stock market analysis and user credit card purchase pattern monitoring, the matches to the user queries are in fact plent...

Probabilistic Event Stream Processing with Lineage

Many sensor network applications such as the monitoring of video camera streams or the management of RFID data streams require the ability to detect composite events over high-volume data streams. Sensor data inputs from the physical world are usually noisy, incomplete and unreliable. Thus they are usually expressed with probability. To manage this kind of data, probabilistic event stream proce...

Ensemble-based Top-k Recommender System Considering Incomplete Data

Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (the prediction problem) or to identify a set of k items that will be of interest to the user (the top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...

Efficient Mining Top-k Regular-Frequent Itemset Using Compressed Tidsets

Association rule discovery based on the support-confidence framework is an important task in data mining. However, the occurrence frequency (support) of a pattern (itemset) may not be a sufficient criterion for discovering interesting patterns. Temporal regularity, which can be a trace of behavior, together with frequency can prove to be an important key in several applications. A pattern can be...

Publication date: 2017